$p$-Wasserstein distance
Certifying Robustness via Topological Representations
Agerberg, Jens, Guidolin, Andrea, Martinelli, Andrea, Hoefgeest, Pepijn Roos, Eklund, David, Scolamiero, Martina
In machine learning, the ability to obtain data representations that capture the underlying geometrical and topological structures of data spaces is crucial. A common approach in Topological Data Analysis for extracting multi-scale intrinsic geometric properties of data is persistent homology (PH) (Carlsson, 2009). As a rich descriptor of geometry, PH has been used in machine learning pipelines in areas such as bioinformatics, neuroscience, and material science (Dindin et al., 2020; Colombo et al., 2022; Lee et al., 2017). Perhaps the key difference between PH and other methods in Geometric Deep Learning is the emphasis on theoretical stability results: PH is a Lipschitz function, with known Lipschitz constants, with respect to appropriate metrics on the data and representation spaces (Cohen-Steiner et al., 2005; Skraba and Turner, 2020). However, composing the PH pipeline with a neural network raises challenges for the stability of the learned representations: stability may be lost, or may become insignificant in practice, when PH representations are composed with neural networks that have large Lipschitz constants. Moreover, the Lipschitz constant of the neural network may be difficult to compute or to control. While the robustness to noise of PH-based machine learning pipelines has been studied empirically (Turkeš et al., 2021), we formulate the problem in the framework of adversarial learning and propose a neural network that can learn stable and discriminative geometric representations from persistence. Our contributions may be summarized as follows: we propose the Stable Rank Network (SRN), a neural network architecture taking PH as input, where the learned representations enjoy a Lipschitz property w.r.t.
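As a loose illustration of the stability concern raised above (and not the authors' SRN architecture), the following hypothetical Python sketch computes persistence diagrams with the `ripser` library, vectorizes them with a simple lifespan histogram, and bounds the downstream network's Lipschitz constant with PyTorch's spectral normalization; the library choices, the histogram vectorization, and all parameter values are illustrative assumptions only.

```python
# Hypothetical sketch (not the authors' SRN): compute persistence diagrams with
# ripser, vectorize them into fixed-size lifespan histograms, and pass them
# through a spectrally normalized MLP so the network's Lipschitz constant
# stays bounded (each linear layer has spectral norm ~1, ReLU is 1-Lipschitz).
import numpy as np
import torch
import torch.nn as nn
from ripser import ripser


def lifespan_histogram(diagram: np.ndarray, bins: int = 32, t_max: float = 2.0) -> np.ndarray:
    """Vectorize a persistence diagram by binning lifespans (death - birth)."""
    finite = diagram[np.isfinite(diagram[:, 1])]
    lifespans = finite[:, 1] - finite[:, 0]
    hist, _ = np.histogram(lifespans, bins=bins, range=(0.0, t_max))
    return hist.astype(np.float32)


def lipschitz_mlp(in_dim: int, hidden: int = 64, out_dim: int = 16) -> nn.Module:
    """MLP whose linear layers are spectrally normalized, so the network itself
    is (approximately) 1-Lipschitz; the vectorization step is not covered."""
    sn = torch.nn.utils.parametrizations.spectral_norm
    return nn.Sequential(
        sn(nn.Linear(in_dim, hidden)), nn.ReLU(),
        sn(nn.Linear(hidden, out_dim)),
    )


# Toy usage: points on a circle -> H1 diagram -> embedding with bounded network.
X = np.random.default_rng(0).normal(size=(200, 2))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # points on the unit circle
dgm_h1 = ripser(X, maxdim=1)["dgms"][1]
features = torch.from_numpy(lifespan_histogram(dgm_h1)).unsqueeze(0)
embedding = lipschitz_mlp(in_dim=features.shape[1])(features)
print(embedding.shape)  # torch.Size([1, 16])
```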
A New Robust Partial $p$-Wasserstein-Based Metric for Comparing Distributions
Raghvendra, Sharath, Shirzadian, Pouyan, Zhang, Kaiyi
The $2$-Wasserstein distance is sensitive to minor geometric differences between distributions, making it a very powerful dissimilarity metric. However, due to this sensitivity, a small outlier mass can also cause a significant increase in the $2$-Wasserstein distance between two similar distributions. Similarly, sampling discrepancy can cause the empirical $2$-Wasserstein distance on $n$ samples in $\mathbb{R}^2$ to converge to the true distance at a rate of $n^{-1/4}$, which is significantly slower than the rate of $n^{-1/2}$ for the $1$-Wasserstein distance. We introduce a new family of distances parameterized by $k \ge 0$, called $k$-RPW, which is based on computing the partial $2$-Wasserstein distance. We show that (1) $k$-RPW satisfies the metric properties, (2) $k$-RPW is robust to small outlier mass while retaining the sensitivity of the $2$-Wasserstein distance to minor geometric differences, and (3) when $k$ is a constant, the $k$-RPW distance between empirical distributions on $n$ samples in $\mathbb{R}^2$ converges to the true distance at a rate of $n^{-1/3}$, which is faster than the convergence rate of $n^{-1/4}$ for the $2$-Wasserstein distance. Using the partial $p$-Wasserstein distance, we extend our distance to any $p \in [1,\infty]$. By setting the parameters $k$ or $p$ appropriately, we can reduce our distance to the total variation (TV), $p$-Wasserstein, and Lévy-Prokhorov distances. Experiments show that our distance function achieves higher accuracy compared to the $1$-Wasserstein, $2$-Wasserstein, and TV distances for image retrieval tasks on noisy real-world data sets.
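The following is a minimal sketch of the partial $2$-Wasserstein building block behind $k$-RPW, not the paper's metric or algorithm; it assumes the POT library (`ot.emd2`, `ot.partial.partial_wasserstein2`) and a toy contaminated point cloud, and shows how discarding a small fraction of mass suppresses the effect of a single outlier.

```python
# Sketch of the partial 2-Wasserstein building block (not the paper's k-RPW
# metric): compare two empirical distributions, one contaminated by a single
# far-away outlier, with and without discarding a small fraction of mass.
# Assumes the POT library (pip install pot).
import numpy as np
import ot
import ot.partial

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
Y = rng.normal(size=(n, 2))
Y[0] = [50.0, 50.0]  # one outlier point in Y

a = np.full(n, 1.0 / n)  # uniform weights
b = np.full(n, 1.0 / n)
M = ot.dist(X, Y)        # squared Euclidean cost matrix

full_cost = ot.emd2(a, b, M)                     # full 2-Wasserstein^2 cost
partial_cost = ot.partial.partial_wasserstein2(  # transport only 99% of mass
    a, b, M, m=0.99
)

print(f"W2^2 (full)       : {full_cost:.3f}")    # blown up by the outlier
print(f"W2^2 (99% of mass): {partial_cost:.3f}")  # robust to the outlier
```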
Neural approximation of Wasserstein distance via a universal architecture for symmetric and factorwise group invariant functions
Learning distance functions between complex objects, such as the Wasserstein distance to compare point sets, is a common goal in machine learning applications. However, functions on such complex objects (e.g., point sets and graphs) are often required to be invariant to a wide variety of group actions, e.g., permutations or rigid transformations. Therefore, continuous and symmetric product functions (such as distance functions) on such complex objects must also be invariant to the product of such group actions. We call these functions symmetric and factor-wise group invariant (SFGI functions, for short). In this paper, we first present a general neural network architecture for approximating SFGI functions. The main contribution of this paper is to combine this general architecture with a sketching idea to develop a specific and efficient neural network that can approximate the $p$-th Wasserstein distance between point sets. Importantly, the required model complexity is independent of the sizes of the input point sets. On the theoretical front, to the best of our knowledge, this is the first result showing that there exists a neural network with the capacity to approximate the Wasserstein distance with bounded model complexity. Our work provides an interesting integration of sketching ideas for geometric problems with universal approximation of symmetric functions. On the empirical front, we present a range of results showing that our newly proposed neural network architecture performs comparably to or better than other models (including a SOTA Siamese-autoencoder-based approach). In particular, our neural network generalizes significantly better and trains much faster than the SOTA Siamese AE. Finally, this line of investigation could be useful in exploring effective neural network designs for solving a broad range of geometric optimization problems (e.g., $k$-means in a metric space).
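For intuition, here is a generic permutation-invariant (DeepSets-style) sketch of a learned distance between point sets; it is not the paper's SFGI-plus-sketching architecture, and the encoder sizes and the symmetric pairing of embeddings are illustrative assumptions. Such a model would be trained to regress exact Wasserstein distances computed offline.

```python
# Generic sketch (not the paper's SFGI/sketching architecture): a DeepSets-style
# encoder makes each point-set embedding permutation invariant; a symmetric
# combination of the two embeddings feeds a regressor for the distance.
import torch
import torch.nn as nn


class SetEncoder(nn.Module):
    """Permutation-invariant embedding of a point set via sum pooling."""

    def __init__(self, point_dim: int = 2, embed_dim: int = 64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(point_dim, 128), nn.ReLU(),
                                 nn.Linear(128, embed_dim))

    def forward(self, points: torch.Tensor) -> torch.Tensor:  # (n, point_dim)
        return self.phi(points).sum(dim=0)                    # (embed_dim,)


class DistanceRegressor(nn.Module):
    """Symmetric in its two set arguments: rho(e1 + e2, |e1 - e2|)."""

    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.encoder = SetEncoder(embed_dim=embed_dim)
        self.rho = nn.Sequential(nn.Linear(2 * embed_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1), nn.Softplus())

    def forward(self, A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
        e1, e2 = self.encoder(A), self.encoder(B)
        pair = torch.cat([e1 + e2, (e1 - e2).abs()])  # symmetric features
        return self.rho(pair).squeeze(-1)


# Usage: predicted distance for two point sets of different sizes.
model = DistanceRegressor()
A, B = torch.randn(30, 2), torch.randn(45, 2)
print(model(A, B))  # scalar tensor; train it against exact W_p targets
```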
Training generative models from privatized data
Reshetova, Daria, Chen, Wei-Ning, Özgür, Ayfer
Local differential privacy (LDP) is a powerful method for privacy-preserving data collection. In this paper, we develop a framework for training Generative Adversarial Networks (GANs) on differentially privatized data. We show that entropic regularization of the Wasserstein distance -- a popular regularization method in the literature, often leveraged for its computational benefits -- can be used to denoise the data distribution when the data is privatized by common additive noise mechanisms, such as the Laplace and Gaussian mechanisms. This combination uniquely enables the mitigation of both the regularization bias and the effects of privatization noise, thereby enhancing the overall efficacy of the model. We analyse the proposed method and provide sample complexity results, as well as experimental evidence, to support its efficacy.
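As a rough sketch of the ingredients (not the authors' training framework or their noise calibration), the snippet below privatizes data with an additive Laplace mechanism and evaluates an entropically regularized Wasserstein cost against model samples using the POT library's `ot.sinkhorn2`; the noise scale and regularization strength are arbitrary toy values.

```python
# Toy sketch: Laplace-privatized data compared to model samples via an
# entropically regularized (Sinkhorn) Wasserstein cost. Assumes the POT library.
import numpy as np
import ot

rng = np.random.default_rng(0)
n, eps = 500, 1.0                        # sample size and toy privacy budget
data = rng.normal(loc=2.0, size=(n, 2))
privatized = data + rng.laplace(scale=1.0 / eps, size=data.shape)  # LDP noise

model_samples = rng.normal(loc=2.0, size=(n, 2))  # stand-in for generator output

a = b = np.full(n, 1.0 / n)
M = ot.dist(model_samples, privatized)   # squared Euclidean cost matrix
M = M / M.max()                          # rescale costs for Sinkhorn stability
sinkhorn_cost = ot.sinkhorn2(a, b, M, reg=0.1)  # entropic-regularized OT cost
print(sinkhorn_cost)
```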
Minimum Wasserstein Distance Estimator under Finite Location-scale Mixtures
When a population exhibits heterogeneity, we often model it via a finite mixture: we decompose it into several different but homogeneous subpopulations. Contemporary practice favors learning the mixture by maximizing the likelihood, for its statistical efficiency, and using the convenient EM algorithm for numerical computation. Yet the maximum likelihood estimate (MLE) is not well defined for the most widely used finite normal mixture in particular, and for finite location-scale mixtures in general. We hence investigate feasible alternatives to the MLE, such as minimum distance estimators. Recently, the Wasserstein distance has drawn increased attention in the machine learning community. It has an intuitive geometric interpretation and has been successfully employed in many new applications. Do we gain anything by learning finite location-scale mixtures via a minimum Wasserstein distance estimator (MWDE)? This paper investigates this possibility in several respects. We find that the MWDE is consistent and derive a numerical solution under finite location-scale mixtures. We study its robustness against outliers and mild model mis-specifications. Our moderately scaled simulation study shows that the MWDE suffers some efficiency loss against a penalized version of the MLE in general, without a noticeable gain in robustness. We reaffirm the general superiority of likelihood-based learning strategies, even for non-regular finite location-scale mixtures.
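For concreteness, here is a toy MWDE sketch for a two-component univariate normal mixture, using the fact that in one dimension the $1$-Wasserstein distance equals the integral of the absolute difference between CDFs; the choice of the $1$-Wasserstein distance, the grid approximation, and the Nelder-Mead optimizer are illustrative assumptions and not the paper's numerical solution.

```python
# Toy MWDE sketch (not the paper's algorithm): fit a two-component 1D normal
# mixture by minimizing the 1-Wasserstein distance, which in one dimension
# equals the integral of |empirical CDF - mixture CDF|, approximated on a grid.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 1.0, 700), rng.normal(3.0, 0.5, 300)])

grid = np.linspace(data.min() - 3, data.max() + 3, 2000)
emp_cdf = np.searchsorted(np.sort(data), grid, side="right") / data.size


def mixture_cdf(theta, x):
    w = 1.0 / (1.0 + np.exp(-theta[0]))           # mixing weight in (0, 1)
    mu1, mu2 = theta[1], theta[2]
    s1, s2 = np.exp(theta[3]), np.exp(theta[4])   # positive scales
    return w * norm.cdf(x, mu1, s1) + (1 - w) * norm.cdf(x, mu2, s2)


def w1_objective(theta):
    diff = np.abs(emp_cdf - mixture_cdf(theta, grid))
    return np.sum(diff) * (grid[1] - grid[0])     # approximate 1D W_1


theta0 = np.array([0.0, -1.0, 1.0, 0.0, 0.0])     # crude initialization
result = minimize(w1_objective, theta0, method="Nelder-Mead")
print(result.x)  # fitted (logit weight, means, log scales)
```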
Sliced Iterative Generator
We introduce the Sliced Iterative Generator (SIG), an iterative generative model that is a Normalizing Flow (NF) but shares the advantages of Generative Adversarial Networks (GANs). The model is based on iterative Optimal Transport of a series of 1D slices through the data space, matching, on each slice, the probability distribution function (PDF) of the samples to that of the data. To improve efficiency, the directions of the orthogonal slices are chosen at each iteration to maximize the PDF difference between the generated samples and the data, as measured by the Wasserstein distance. A patch-based approach is adopted to model the images hierarchically, enabling the model to scale well to high dimensions. Unlike GANs, SIG has an NF structure and allows efficient likelihood evaluations that can be used in downstream tasks. We show that SIG is capable of generating realistic, high-dimensional image samples, achieving state-of-the-art FID scores on MNIST and Fashion-MNIST without any dimensionality reduction. It also has good out-of-distribution detection properties based on the likelihood. To the best of our knowledge, SIG is the first iterative (greedy) deep learning algorithm that is competitive with state-of-the-art non-iterative generators in high dimensions. While SIG has a deep neural network architecture, the approach deviates significantly from the current deep learning paradigm, as it does not use concepts such as mini-batching, stochastic gradient descent, gradient back-propagation through deep layers, or non-convex loss function optimization. SIG is largely insensitive to hyper-parameter tuning, making it a useful generative tool for ML experts and non-experts alike.
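The core sliced-OT step can be sketched as follows; this is a bare-bones illustration with a single random slice per iteration, omitting SIG's orthogonal max-Wasserstein slices, patch hierarchy, and image-scale machinery.

```python
# Minimal sketch of one sliced-OT update (not the full SIG algorithm): project
# both sample and data sets on a random direction, match their sorted 1D
# projections (the exact 1D optimal transport map for equal-size empirical
# distributions), and move the samples along that direction accordingly.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=(3.0, -1.0), size=(1000, 2))   # target distribution
samples = rng.normal(size=(1000, 2))                 # current generated samples

for _ in range(50):  # a few greedy iterations
    direction = rng.normal(size=2)
    direction /= np.linalg.norm(direction)

    proj_samples = samples @ direction
    proj_data = data @ direction

    # 1D OT map: the i-th smallest sample projection goes to the i-th smallest
    # data projection; apply the displacement along the slice direction.
    order_s = np.argsort(proj_samples)
    target = np.sort(proj_data)
    displacement = np.empty_like(proj_samples)
    displacement[order_s] = target - proj_samples[order_s]
    samples = samples + displacement[:, None] * direction

print(samples.mean(axis=0))  # should approach the data mean, roughly (3, -1)
```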
(q,p)-Wasserstein GANs: Comparing Ground Metrics for Wasserstein GANs
Mallasto, Anton, Frellsen, Jes, Boomsma, Wouter, Feragen, Aasa
Generative Adversarial Networks (GANs) have made a major impact in computer vision and machine learning as generative models. Wasserstein GANs (WGANs) brought Optimal Transport (OT) theory into GANs by minimizing the $1$-Wasserstein distance between the model and data distributions as their objective function. Since then, WGANs have gained considerable interest due to their stability and theoretical framework. We contribute to the WGAN literature by introducing the family of $(q,p)$-Wasserstein GANs, which allow the use of more general $p$-Wasserstein metrics for $p \geq 1$ in the GAN learning procedure. While the method can incorporate any cost function as the ground metric, we focus on studying the $l^q$ metrics for $q \geq 1$. This is a notable generalization, as OT distances in the WGAN literature are commonly based on the $l^2$ ground metric. We demonstrate the effect of different $p$-Wasserstein distances in two toy examples. Furthermore, we show that the ground metric does make a difference, by comparing different $(q,p)$ pairs on the MNIST and CIFAR-10 datasets. Our experiments demonstrate that changing the ground metric and $p$ can notably improve on the common $(q,p) = (2,1)$ case.
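As a sketch of the ground-metric ingredient (not the GAN training procedure), the following computes a $(q,p)$-Wasserstein distance between two empirical point clouds, assuming the POT and SciPy libraries; the ground metric is the $l^q$ distance and the transport cost is raised to the power $p$.

```python
# Sketch of the (q, p)-Wasserstein distance between two empirical point clouds:
# l^q ground metric, transport cost raised to the power p, then the p-th root.
# Assumes the POT library and SciPy.
import numpy as np
import ot
from scipy.spatial.distance import cdist


def qp_wasserstein(X, Y, q=2.0, p=1.0):
    """p-Wasserstein distance with l^q ground metric between uniform point clouds."""
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(len(Y), 1.0 / len(Y))
    ground = cdist(X, Y, metric="minkowski", p=q)   # pairwise l^q distances
    return ot.emd2(a, b, ground ** p) ** (1.0 / p)


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
Y = rng.normal(loc=1.0, size=(300, 2))
for q, p in [(2, 1), (1, 1), (2, 2), (1, 2)]:
    print(f"({q},{p})-Wasserstein: {qp_wasserstein(X, Y, q, p):.3f}")
```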